AITopics | implicit bias

Deep learning systems are known to exhibit implicit regularization (alt. implicit bias), favoring simple solutions instead of merely minimizing the loss function. In some cases, we can analytically derive the implicit regularization -- connecting it to an equivalent penalty that augments the learning objective. However, modern deep learning systems are complex, carrying modifications to the training procedure and architecture (e.g. early stopping, minibatching, dropout) whose effects are not always directly interpretable. Although estimating the resulting implicit regularization could aid theorists in algorithm design and practitioners in interpreting their hyperparameter choices, this problem has received little direct attention. It is also tractable: regularization makes weight updates deviate from loss gradients, promising a signal for identifying implicit bias. Here we provide gradient matching methods that can be used to empirically estimate the implicit regularization. Our method works on networks with known regularization, recovering popular explicit penalties like $\ell_1$ and $\ell_2$. It also replicates known implicit effects, like the quadratic weight penalty induced by early stopping in gradient descent, demonstrating that it can be used to test theories of implicit regularization. Crucially, because our method is empirical, it can handle implicit regularization in arbitrary networks. We demonstrate this use by characterizing the effects of dropout in deep networks, showing implicit $\ell_2$ effects in this popular method. Our work shows that practitioners can use gradient matching to understand regularization in networks with implicit biases that are too complicated to derive analytically.

artificial intelligence, machine learning, regularization, (17 more...)

arXiv.org Machine Learning

2605.05436

Country: North America > United States > Pennsylvania (0.40)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

eb189151ced0ff808abafd16a51fec92-Paper-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 04:41:06 GMT

artificial intelligence, machine learning, stepsize, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

6a14c7f9fb3f42645cfa6bd5aa446819-Paper-Conference.pdf

Neural Information Processing SystemsApr-28-2026, 10:52:54 GMT

artificial intelligence, machine learning, regime, (17 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures

Neural Information Processing SystemsApr-25-2026, 16:37:56 GMT

Modern machine learning systems such as deep neural networks are often highly over-parameterized so that they can fit the noisy training data exactly, yet they can still achieve small test errors in practice. In this paper, we study this "benign overfitting" phenomenon of the maximum margin classifier for linear classification problems. Specifically, we consider data generated from sub-Gaussian mixtures, and provide a tight risk bound for the maximum margin linear classifier in the over-parameterized setting. Our results precisely characterize the condition under which benign overfitting can occur in linear classification problems, and improve on previous work. They also have direct implications for over-parameterized logistic regression.

artificial intelligence, machine learning, population risk, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

1c26c389d60ec419fd24b5fee5b35796-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 13:48:20 GMT

artificial intelligence, machine learning, optimization problem, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks

Neural Information Processing SystemsApr-25-2026, 13:48:16 GMT

In this work, we study the implications of the implicit bias of gradient flow on generalization and adversarial robustness in ReLU networks. We focus on a setting where the data consists of clusters and the correlations between cluster means are small, and show that in two-layer ReLU networks gradient flow is biased towards solutions that generalize well, but are vulnerable to adversarial examples. Our results hold even in cases where the network is highly overparameterized. Despite the potential for harmful overfitting in such settings, we prove that the implicit bias of gradient flow prevents it. However, the implicit bias also leads to non-robust solutions (susceptible to small adversarial ℓ2-perturbations), even though robust networks that fit the data exist.

artificial intelligence, machine learning, optimization problem, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Add feedback

On the Implicit Bias of Linear Equivariant Steerable Networks

Neural Information Processing SystemsApr-25-2026, 03:11:10 GMT

We study the implicit bias of gradient flow on linear equivariant steerable networks in group-invariant binary classification. Our findings reveal that the parameterized predictor converges in direction to the unique group-invariant classifier with a maximum margin defined by the input group action. Under a unitary assumption on the input representation, we establish the equivalence between steerable networks and data augmentation. Furthermore, we demonstrate the improved margin and generalization bound of steerable networks over their non-invariant counterparts.

artificial intelligence, machine learning, steerable network, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

On the Implicit Bias of Linear Equivariant Steerable Networks

Neural Information Processing SystemsApr-25-2026, 03:11:06 GMT

We study the implicit bias of gradient flow on linear equivariant steerable networks in group-invariant binary classification. Our findings reveal that the parameterized predictor converges in direction to the unique group-invariant classifier with a maximum margin defined by the input group action. Under a unitary assumption on the input representation, we establish the equivalence between steerable networks and data augmentation. Furthermore, we demonstrate the improved margin and generalization bound of steerable networks over their non-invariant counterparts.

artificial intelligence, machine learning, steerable network, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.48)

Technology: